Overview

Dataset statistics

Number of variables: 25
Number of observations: 38471
Missing cells: 67074
Missing cells (%): 7.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.3 MiB
Average record size in memory: 200.0 B
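
The missing-cell percentage can be checked from the other figures in this table. The sketch below assumes the percentage is taken over all cells (variables times observations); the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 25
n_observations = 38471
missing_cells = 67074

# Assumption: the percentage is computed over every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
```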

Variable types

Numeric: 8
DateTime: 2
Categorical: 14
Boolean: 1

Warnings

Returned has constant value "True" (Constant)
Customer ID has a high cardinality: 1587 distinct values (High cardinality)
Customer Name has a high cardinality: 795 distinct values (High cardinality)
City has a high cardinality: 3475 distinct values (High cardinality)
State has a high cardinality: 1072 distinct values (High cardinality)
Country has a high cardinality: 147 distinct values (High cardinality)
Product ID has a high cardinality: 9815 distinct values (High cardinality)
Product Name has a high cardinality: 3750 distinct values (High cardinality)
Sub-Category is highly correlated with Returned and 1 other field (High correlation)
Region is highly correlated with Returned (High correlation)
Returned is highly correlated with Sub-Category and 6 other fields (High correlation)
Order Priority is highly correlated with Returned (High correlation)
Category is highly correlated with Sub-Category and 1 other field (High correlation)
Ship Mode is highly correlated with Returned (High correlation)
Market is highly correlated with Returned (High correlation)
Segment is highly correlated with Returned (High correlation)
Postal Code has 30915 (80.4%) missing values (Missing)
Returned has 36159 (94.0%) missing values (Missing)
df_index is uniformly distributed (Uniform)
df_index has unique values (Unique)
Discount has 21767 (56.6%) zeros (Zeros)
Profit has 510 (1.3%) zeros (Zeros)
ship_delay has 1987 (5.2%) zeros (Zeros)
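
The percentages in the missing-value and zero warnings follow from the counts and the number of observations. A minimal sketch, assuming each share is the count divided by all 38471 rows; the helper name `share` is hypothetical.

```python
n_observations = 38471

# (column, kind, count) triples copied from the warnings list.
flagged = [
    ("Postal Code", "missing", 30915),
    ("Returned", "missing", 36159),
    ("Discount", "zeros", 21767),
    ("Profit", "zeros", 510),
    ("ship_delay", "zeros", 1987),
]


def share(count, total=n_observations):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100 * count / total, 1)


shares = {column: share(count) for column, _kind, count in flagged}
for column, kind, count in flagged:
    print(f"{column}: {count} {kind} ({shares[column]}%)")
```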

Reproduction

Analysis started: 2021-10-15 07:52:47.756270
Analysis finished: 2021-10-15 07:52:59.188527
Duration: 11.43 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml
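
As a quick consistency check, the reported duration can be recomputed from the two timestamps. This is only a sketch; the format string is an assumption about how the timestamps are written.

```python
from datetime import datetime

# Assumed timestamp layout, matching the two lines above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2021-10-15 07:52:47.756270", FMT)
finished = datetime.strptime("2021-10-15 07:52:59.188527", FMT)

duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```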

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 38471
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 25664.07925
Minimum: 0
Maximum: 51294
Zeros: 1
Zeros (%): < 0.1%
Memory size: 300.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2572.5
Q1: 12835.5
median: 25602
Q3: 38518.5
95-th percentile: 48757.5
Maximum: 51294
Range: 51294
Interquartile range (IQR): 25683
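
Range and IQR are derived from the quantiles listed above. A small sketch of the arithmetic, with illustrative dictionary keys:

```python
# Quantiles copied from the df_index table.
quantiles = {"min": 0, "q1": 12835.5, "median": 25602, "q3": 38518.5, "max": 51294}

value_range = quantiles["max"] - quantiles["min"]  # Range = Maximum - Minimum
iqr = quantiles["q3"] - quantiles["q1"]            # IQR = Q3 - Q1

print(value_range, iqr)
```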

Descriptive statistics

Standard deviation: 14811.31478
Coefficient of variation (CV): 0.5771223908
Kurtosis: -1.199828273
Mean: 25664.07925
Median Absolute Deviation (MAD): 12841
Skewness: 0.00232785384
Sum: 987322793
Variance: 219375045.4
Monotonicity: Not monotonic
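
Several of these statistics are tied together by simple identities: mean = sum / n, CV = standard deviation / mean, and variance = standard deviation squared. The sketch below recomputes them from the rounded figures in the table, so small rounding differences are expected.

```python
import math

n = 38471            # number of observations (no missing values)
total = 987322793    # Sum
std = 14811.31478    # Standard deviation, as printed (rounded)

mean = total / n
cv = std / mean
variance = std ** 2

print(f"mean={mean:.5f} cv={cv:.10f} variance={variance:.1f}")
```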
Histogram with fixed size bins (bins=50)

Common values
Value                 Count  Frequency (%)
0                     1      < 0.1%
19164                 1      < 0.1%
41697                 1      < 0.1%
47842                 1      < 0.1%
45795                 1      < 0.1%
35556                 1      < 0.1%
33509                 1      < 0.1%
37607                 1      < 0.1%
49901                 1      < 0.1%
8945                  1      < 0.1%
Other values (38461)  38461  > 99.9%

Minimum 5 values
Value  Count  Frequency (%)
0      1      < 0.1%
2      1      < 0.1%
3      1      < 0.1%
5      1      < 0.1%
8      1      < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
51294  1      < 0.1%
51293  1      < 0.1%
51292  1      < 0.1%
51291  1      < 0.1%
51290  1      < 0.1%

Distinct: 1424
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB
Minimum: 2011-01-01 00:00:00
Maximum: 2014-12-31 00:00:00
Histogram with fixed size bins (bins=50)

Distinct: 1463
Distinct (%): 3.8%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB
Minimum: 2011-01-03 00:00:00
Maximum: 2015-01-07 00:00:00
Histogram with fixed size bins (bins=50)
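
For the two DateTime variables above (their names did not survive in this extract), the distinct-day counts can be compared with the number of calendar days between each minimum and maximum. A minimal sketch; the helper `inclusive_days` is hypothetical.

```python
from datetime import date


def inclusive_days(first, last):
    """Number of calendar days from first to last, counting both ends."""
    return (last - first).days + 1


first_span = inclusive_days(date(2011, 1, 1), date(2014, 12, 31))
second_span = inclusive_days(date(2011, 1, 3), date(2015, 1, 7))

# Share of calendar days in each span that appear at least once.
first_coverage = 1424 / first_span
second_coverage = 1463 / second_span

print(first_span, f"{first_coverage:.1%}")
print(second_span, f"{second_coverage:.1%}")
```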

Ship Mode
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

Standard Class: 23148
Second Class: 7670
First Class: 5588
Same Day: 2065

Length

Max length: 14
Median length: 14
Mean length: 12.84344051
Min length: 8

Characters and Unicode

Total characters: 494100
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
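
The length and character totals for Ship Mode can be reproduced from its four category counts. The sketch below assumes the values contain no hidden whitespace beyond what is shown in the report.

```python
# Category counts copied from the Ship Mode section.
counts = {
    "Standard Class": 23148,
    "Second Class": 7670,
    "First Class": 5588,
    "Same Day": 2065,
}

n_rows = sum(counts.values())
total_chars = sum(len(value) * n for value, n in counts.items())
mean_length = total_chars / n_rows

print(n_rows, total_chars, round(mean_length, 8))
```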

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Standard Class
2nd row: Standard Class
3rd row: Second Class
4th row: Second Class
5th row: First Class

Value           Count  Frequency (%)
Standard Class  23148  60.2%
Second Class    7670   19.9%
First Class     5588   14.5%
Same Day        2065   5.4%

Histogram of lengths of the category

Value     Count  Frequency (%)
class     36406  47.3%
standard  23148  30.1%
second    7670   10.0%
first     5588   7.3%
same      2065   2.7%
day       2065   2.7%

Most occurring characters

Value             Count  Frequency (%)
a                 86832  17.6%
s                 78400  15.9%
d                 53966  10.9%
(space)           38471  7.8%
C                 36406  7.4%
l                 36406  7.4%
S                 32883  6.7%
n                 30818  6.2%
t                 28736  5.8%
r                 28736  5.8%
Other values (8)  42446  8.6%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  378687  76.6%
Uppercase Letter  76942   15.6%
Space Separator   38471   7.8%
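
The category names in this table are Unicode general categories. A rough sketch of how such totals could be recomputed with Python's standard `unicodedata` module, using the Ship Mode counts from this report (`Ll`, `Lu` and `Zs` are the standard short codes for Lowercase Letter, Uppercase Letter and Space Separator); this is an illustration, not necessarily how the profiling tool computes them.

```python
import unicodedata
from collections import Counter

# Category counts copied from the Ship Mode section.
counts = {
    "Standard Class": 23148,
    "Second Class": 7670,
    "First Class": 5588,
    "Same Day": 2065,
}

category_totals = Counter()
for value, n in counts.items():
    for ch in value:
        # Weight every character by the number of rows holding this value.
        category_totals[unicodedata.category(ch)] += n

print(dict(category_totals))
```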

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
a                 86832  22.9%
s                 78400  20.7%
d                 53966  14.3%
l                 36406  9.6%
n                 30818  8.1%
t                 28736  7.6%
r                 28736  7.6%
e                 9735   2.6%
c                 7670   2.0%
o                 7670   2.0%
Other values (3)  9718   2.6%

Uppercase Letter
Value  Count  Frequency (%)
C      36406  47.3%
S      32883  42.7%
F      5588   7.3%
D      2065   2.7%

Space Separator
Value    Count  Frequency (%)
(space)  38471  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   455629  92.2%
Common  38471   7.8%

Most frequent character per script

Latin
Value             Count  Frequency (%)
a                 86832  19.1%
s                 78400  17.2%
d                 53966  11.8%
C                 36406  8.0%
l                 36406  8.0%
S                 32883  7.2%
n                 30818  6.8%
t                 28736  6.3%
r                 28736  6.3%
e                 9735   2.1%
Other values (7)  32711  7.2%

Common
Value    Count  Frequency (%)
(space)  38471  100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  494100  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
a                 86832  17.6%
s                 78400  15.9%
d                 53966  10.9%
(space)           38471  7.8%
C                 36406  7.4%
l                 36406  7.4%
S                 32883  6.7%
n                 30818  6.2%
t                 28736  5.8%
r                 28736  5.8%
Other values (8)  42446  8.6%

Customer ID
Categorical

HIGH CARDINALITY

Distinct: 1587
Distinct (%): 4.1%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

PO-18850: 79
CK-12205: 71
BE-11335: 70
ZC-21910: 68
EM-13960: 67
Other values (1582): 38116

Length

Max length: 8
Median length: 8
Mean length: 7.815887292
Min length: 5

Characters and Unicode

Total characters: 300685
Distinct characters: 40
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: SP-20860
2nd row: JD-15895
3rd row: AB-10600
4th row: GH-14410
5th row: KW-16435

Value                Count  Frequency (%)
PO-18850             79     0.2%
CK-12205             71     0.2%
BE-11335             70     0.2%
ZC-21910             68     0.2%
EM-13960             67     0.2%
JG-15805             67     0.2%
BW-11110             66     0.2%
SW-20755             65     0.2%
WB-21850             64     0.2%
MY-18295             64     0.2%
Other values (1577)  37790  98.2%

Histogram of lengths of the category

Value                Count  Frequency (%)
po-18850             79     0.2%
ck-12205             71     0.2%
be-11335             70     0.2%
zc-21910             68     0.2%
jg-15805             67     0.2%
em-13960             67     0.2%
bw-11110             66     0.2%
sw-20755             65     0.2%
mp-17965             64     0.2%
my-18295             64     0.2%
Other values (1577)  37790  98.2%

Most occurring characters

Value              Count  Frequency (%)
1                  41114  13.7%
-                  38471  12.8%
0                  32455  10.8%
5                  29920  10.0%
2                  16176  5.4%
8                  11089  3.7%
6                  11031  3.7%
7                  11010  3.7%
3                  10926  3.6%
4                  10800  3.6%
Other values (30)  87693  29.2%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    185272  61.6%
Uppercase Letter  76797   25.5%
Dash Punctuation  38471   12.8%
Lowercase Letter  145     < 0.1%

Most frequent character per category

Uppercase Letter
Value              Count  Frequency (%)
M                  6765   8.8%
C                  6660   8.7%
S                  6555   8.5%
B                  6339   8.3%
D                  4908   6.4%
J                  4649   6.1%
A                  4415   5.7%
H                  3920   5.1%
P                  3880   5.1%
R                  3647   4.7%
Other values (16)  25059  32.6%

Decimal Number
Value  Count  Frequency (%)
1      41114  22.2%
0      32455  17.5%
5      29920  16.1%
2      16176  8.7%
8      11089  6.0%
6      11031  6.0%
7      11010  5.9%
3      10926  5.9%
4      10800  5.8%
9      10751  5.8%

Lowercase Letter
Value  Count  Frequency (%)
p      57     39.3%
o      54     37.2%
l      34     23.4%

Dash Punctuation
Value  Count  Frequency (%)
-      38471  100.0%

Most occurring scripts

Value   Count   Frequency (%)
Common  223743  74.4%
Latin   76942   25.6%

Most frequent character per script

Latin
Value              Count  Frequency (%)
M                  6765   8.8%
C                  6660   8.7%
S                  6555   8.5%
B                  6339   8.2%
D                  4908   6.4%
J                  4649   6.0%
A                  4415   5.7%
H                  3920   5.1%
P                  3880   5.0%
R                  3647   4.7%
Other values (19)  25204  32.8%

Common
Value  Count  Frequency (%)
1      41114  18.4%
-      38471  17.2%
0      32455  14.5%
5      29920  13.4%
2      16176  7.2%
8      11089  5.0%
6      11031  4.9%
7      11010  4.9%
3      10926  4.9%
4      10800  4.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  300685  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
1                  41114  13.7%
-                  38471  12.8%
0                  32455  10.8%
5                  29920  10.0%
2                  16176  5.4%
8                  11089  3.7%
6                  11031  3.7%
7                  11010  3.7%
3                  10926  3.6%
4                  10800  3.6%
Other values (30)  87693  29.2%

Customer Name
Categorical

HIGH CARDINALITY

Distinct: 795
Distinct (%): 2.1%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

c1301105bfca084673e0bb8fa3103221: 84
8fe3138a7ef91d7f8635f63b9d5331ad: 83
f054cc2c916e6fd23d9afd7e4f101362: 81
57a1a3a30c5c54262ba894270d3c3314: 81
9d5201e7963b7f4c2136b5168dbd91f9: 80
Other values (790): 38062

Length

Max length: 32
Median length: 32
Mean length: 32
Min length: 32

Characters and Unicode

Total characters: 1231072
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%
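
Every Customer Name value is exactly 32 characters drawn from 16 distinct characters, which is the shape of a hexadecimal MD5 digest. The report does not say how the column was produced, so the sketch below is only an assumption about one way such values could arise; the input name is made up.

```python
import hashlib

# Hypothetical input; the real names are not available in this report.
name = "Jane Doe"

# MD5 is used here only because its hex digest has the observed shape.
digest = hashlib.md5(name.encode("utf-8")).hexdigest()

print(digest, len(digest))

# 38471 rows of 32 characters each give the reported character total.
total_chars = 32 * 38471
print(total_chars)
```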

Sample

1st row: a7d03c30d416fc5f7d695b495884fdd7
2nd row: 1b2850c124acd1bc24237b4b5228b65e
3rd row: 6acab08bb2b385c8569adfd24730ee01
4th row: 1528a0a296f3ecf500753855ea9a21a5
5th row: 648a7c6f93ee0f453ee1378466a84ff8

Value                             Count  Frequency (%)
c1301105bfca084673e0bb8fa3103221  84     0.2%
8fe3138a7ef91d7f8635f63b9d5331ad  83     0.2%
f054cc2c916e6fd23d9afd7e4f101362  81     0.2%
57a1a3a30c5c54262ba894270d3c3314  81     0.2%
9d5201e7963b7f4c2136b5168dbd91f9  80     0.2%
2d806890acc865414ad191e4f11ec62a  77     0.2%
0e64857da6f1a22cf71a0bdefb9f2bbc  74     0.2%
a9066b389900001da23e2dd934673faf  74     0.2%
3e8c46cbd78f47c95668adf74cef15af  74     0.2%
cdb986bad53909051244769475ad755f  73     0.2%
Other values (785)                37690  98.0%

Histogram of lengths of the category

Value                             Count  Frequency (%)
c1301105bfca084673e0bb8fa3103221  84     0.2%
8fe3138a7ef91d7f8635f63b9d5331ad  83     0.2%
f054cc2c916e6fd23d9afd7e4f101362  81     0.2%
57a1a3a30c5c54262ba894270d3c3314  81     0.2%
9d5201e7963b7f4c2136b5168dbd91f9  80     0.2%
2d806890acc865414ad191e4f11ec62a  77     0.2%
0e64857da6f1a22cf71a0bdefb9f2bbc  74     0.2%
a9066b389900001da23e2dd934673faf  74     0.2%
3e8c46cbd78f47c95668adf74cef15af  74     0.2%
cdb986bad53909051244769475ad755f  73     0.2%
Other values (785)                37690  98.0%

Most occurring characters

Value             Count   Frequency (%)
b                 81147   6.6%
d                 79935   6.5%
e                 79877   6.5%
9                 79333   6.4%
4                 78484   6.4%
0                 78114   6.3%
1                 77314   6.3%
f                 77062   6.3%
7                 76657   6.2%
2                 76536   6.2%
Other values (6)  446613  36.3%

Most occurring categories

Value             Count   Frequency (%)
Decimal Number    764340  62.1%
Lowercase Letter  466732  37.9%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
9      79333  10.4%
4      78484  10.3%
0      78114  10.2%
1      77314  10.1%
7      76657  10.0%
2      76536  10.0%
3      75898  9.9%
8      74417  9.7%
5      74169  9.7%
6      73418  9.6%

Lowercase Letter
Value  Count  Frequency (%)
b      81147  17.4%
d      79935  17.1%
e      79877  17.1%
f      77062  16.5%
a      74960  16.1%
c      73751  15.8%

Most occurring scripts

Value   Count   Frequency (%)
Common  764340  62.1%
Latin   466732  37.9%

Most frequent character per script

Common
Value  Count  Frequency (%)
9      79333  10.4%
4      78484  10.3%
0      78114  10.2%
1      77314  10.1%
7      76657  10.0%
2      76536  10.0%
3      75898  9.9%
8      74417  9.7%
5      74169  9.7%
6      73418  9.6%

Latin
Value  Count  Frequency (%)
b      81147  17.4%
d      79935  17.1%
e      79877  17.1%
f      77062  16.5%
a      74960  16.1%
c      73751  15.8%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1231072  100.0%

Most frequent character per block

ASCII
Value             Count   Frequency (%)
b                 81147   6.6%
d                 79935   6.5%
e                 79877   6.5%
9                 79333   6.4%
4                 78484   6.4%
0                 78114   6.3%
1                 77314   6.3%
f                 77062   6.3%
7                 76657   6.2%
2                 76536   6.2%
Other values (6)  446613  36.3%

Segment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

Consumer: 20019
Corporate: 11471
Home Office: 6981

Length

Max length: 11
Median length: 8
Mean length: 8.842556731
Min length: 8

Characters and Unicode

Total characters: 340182
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Corporate
2nd row: Corporate
3rd row: Corporate
4th row: Home Office
5th row: Consumer

Value        Count  Frequency (%)
Consumer     20019  52.0%
Corporate    11471  29.8%
Home Office  6981   18.1%

Histogram of lengths of the category

Value      Count  Frequency (%)
consumer   20019  44.0%
corporate  11471  25.2%
office     6981   15.4%
home       6981   15.4%

Most occurring characters

Value             Count  Frequency (%)
o                 49942  14.7%
e                 45452  13.4%
r                 42961  12.6%
C                 31490  9.3%
m                 27000  7.9%
n                 20019  5.9%
s                 20019  5.9%
u                 20019  5.9%
f                 13962  4.1%
p                 11471  3.4%
Other values (7)  57847  17.0%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  287749  84.6%
Uppercase Letter  45452   13.4%
Space Separator   6981    2.1%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
o                 49942  17.4%
e                 45452  15.8%
r                 42961  14.9%
m                 27000  9.4%
n                 20019  7.0%
s                 20019  7.0%
u                 20019  7.0%
f                 13962  4.9%
p                 11471  4.0%
a                 11471  4.0%
Other values (3)  25433  8.8%

Uppercase Letter
Value  Count  Frequency (%)
C      31490  69.3%
H      6981   15.4%
O      6981   15.4%

Space Separator
Value    Count  Frequency (%)
(space)  6981   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   333201  97.9%
Common  6981    2.1%

Most frequent character per script

Latin
Value             Count  Frequency (%)
o                 49942  15.0%
e                 45452  13.6%
r                 42961  12.9%
C                 31490  9.5%
m                 27000  8.1%
n                 20019  6.0%
s                 20019  6.0%
u                 20019  6.0%
f                 13962  4.2%
p                 11471  3.4%
Other values (6)  50866  15.3%

Common
Value    Count  Frequency (%)
(space)  6981   100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  340182  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
o                 49942  14.7%
e                 45452  13.4%
r                 42961  12.6%
C                 31490  9.3%
m                 27000  7.9%
n                 20019  5.9%
s                 20019  5.9%
u                 20019  5.9%
f                 13962  4.1%
p                 11471  3.4%
Other values (7)  57847  17.0%

City
Categorical

HIGH CARDINALITY

Distinct: 3475
Distinct (%): 9.0%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

New York City: 707
Los Angeles: 580
Philadelphia: 402
San Francisco: 376
Manila: 323
Other values (3470): 36083

Length

Max length: 35
Median length: 8
Mean length: 8.424709521
Min length: 2

Characters and Unicode

Total characters: 324107
Distinct characters: 76
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 591
Unique (%): 1.5%

Sample

1st row: Murfreesboro
2nd row: Oosterhout
3rd row: Phnom Penh
4th row: Lima
5th row: London

Value                Count  Frequency (%)
New York City        707    1.8%
Los Angeles          580    1.5%
Philadelphia         402    1.0%
San Francisco        376    1.0%
Manila               323    0.8%
Santo Domingo        321    0.8%
Seattle              315    0.8%
Houston              284    0.7%
Tegucigalpa          263    0.7%
Lagos                260    0.7%
Other values (3465)  34640  90.0%

Histogram of lengths of the category

Value                Count  Frequency (%)
city                 1348   2.8%
san                  1271   2.7%
new                  737    1.5%
york                 734    1.5%
los                  682    1.4%
angeles              584    1.2%
de                   464    1.0%
francisco            408    0.9%
philadelphia         402    0.8%
santo                339    0.7%
Other values (3638)  40989  85.5%

Most occurring characters

Value              Count   Frequency (%)
a                  41415   12.8%
n                  24347   7.5%
e                  23915   7.4%
o                  22914   7.1%
i                  20255   6.2%
r                  17864   5.5%
l                  16110   5.0%
s                  12155   3.8%
t                  11937   3.7%
u                  11734   3.6%
Other values (66)  121461  37.5%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   265795  82.0%
Uppercase Letter   47596   14.7%
Space Separator    9487    2.9%
Dash Punctuation   955     0.3%
Other Punctuation  266     0.1%
Open Punctuation   3       < 0.1%
Close Punctuation  3       < 0.1%
Final Punctuation  2       < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
a                  41415  15.6%
n                  24347  9.2%
e                  23915  9.0%
o                  22914  8.6%
i                  20255  7.6%
r                  17864  6.7%
l                  16110  6.1%
s                  12155  4.6%
t                  11937  4.5%
u                  11734  4.4%
Other values (32)  63149  23.8%

Uppercase Letter
Value              Count  Frequency (%)
S                  5624   11.8%
C                  5417   11.4%
M                  4539   9.5%
B                  3498   7.3%
L                  3223   6.8%
A                  3104   6.5%
P                  3022   6.3%
T                  1974   4.1%
D                  1917   4.0%
N                  1863   3.9%
Other values (17)  13415  28.2%

Other Punctuation
Value  Count  Frequency (%)
'      262    98.5%
.      4      1.5%

Space Separator
Value    Count  Frequency (%)
(space)  9487   100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      955    100.0%

Final Punctuation
Value            Count  Frequency (%)
(not preserved)  2      100.0%

Open Punctuation
Value  Count  Frequency (%)
(      3      100.0%

Close Punctuation
Value  Count  Frequency (%)
)      3      100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   313391  96.7%
Common  10716   3.3%

Most frequent character per script

Latin
Value              Count   Frequency (%)
a                  41415   13.2%
n                  24347   7.8%
e                  23915   7.6%
o                  22914   7.3%
i                  20255   6.5%
r                  17864   5.7%
l                  16110   5.1%
s                  12155   3.9%
t                  11937   3.8%
u                  11734   3.7%
Other values (59)  110745  35.3%

Common
Value            Count  Frequency (%)
(space)          9487   88.5%
-                955    8.9%
'                262    2.4%
.                4      < 0.1%
(                3      < 0.1%
)                3      < 0.1%
(not preserved)  2      < 0.1%

Most occurring blocks

Value        Count   Frequency (%)
ASCII        322287  99.4%
None         1818    0.6%
Punctuation  2       < 0.1%
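
Most City characters fall in the ASCII block; the rest are accented letters and two punctuation marks. The sketch below shows one simple way to separate ASCII from non-ASCII characters (code points below 128) and checks that the three block counts add up to the reported total. The sample city names are illustrative and are not claimed to come from the dataset.

```python
def split_ascii(text):
    """Return (ascii_count, non_ascii_count) for the characters in text."""
    ascii_count = sum(1 for ch in text if ord(ch) < 128)
    return ascii_count, len(text) - ascii_count


# Hypothetical examples of names with and without accented letters.
samples = ["Lima", "Bogotá", "São Paulo"]
split = {name: split_ascii(name) for name in samples}
print(split)

# Block counts copied from the table above.
block_counts = {"ASCII": 322287, "None": 1818, "Punctuation": 2}
total_chars = sum(block_counts.values())
print(total_chars)
```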

Most frequent character per block

ASCII
Value              Count   Frequency (%)
a                  41415   12.9%
n                  24347   7.6%
e                  23915   7.4%
o                  22914   7.1%
i                  20255   6.3%
r                  17864   5.5%
l                  16110   5.0%
s                  12155   3.8%
t                  11937   3.7%
u                  11734   3.6%
Other values (48)  119641  37.1%

None
Value             Count  Frequency (%)
á                 480    26.4%
í                 386    21.2%
ó                 313    17.2%
é                 224    12.3%
ã                 194    10.7%
ú                 61     3.4%
ü                 43     2.4%
ç                 33     1.8%
ñ                 24     1.3%
â                 20     1.1%
Other values (7)  40     2.2%

Punctuation
Value            Count  Frequency (%)
(not preserved)  2      100.0%

State
Categorical

HIGH CARDINALITY

Distinct: 1072
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

California: 1518
England: 1123
New York: 866
Texas: 759
Ile-de-France: 725
Other values (1067): 33480

Length

Max length: 36
Median length: 8
Mean length: 9.641470198
Min length: 3

Characters and Unicode

Total characters: 370917
Distinct characters: 84
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 3

Unique

Unique: 91
Unique (%): 0.2%

Sample

1st row: Tennessee
2nd row: North Brabant
3rd row: Phnom Penh
4th row: Lima (city)
5th row: England

Value                   Count  Frequency (%)
California              1518   3.9%
England                 1123   2.9%
New York                866    2.3%
Texas                   759    2.0%
Ile-de-France           725    1.9%
New South Wales         600    1.6%
North Rhine-Westphalia  543    1.4%
Queensland              539    1.4%
San Salvador            468    1.2%
National Capital        440    1.1%
Other values (1062)     30890  80.3%

Histogram of lengths of the category

Value                Count  Frequency (%)
california           1617   3.2%
new                  1609   3.1%
england              1123   2.2%
south                912    1.8%
york                 866    1.7%
north                864    1.7%
texas                759    1.5%
ile-de-france        725    1.4%
wales                627    1.2%
san                  596    1.2%
Other values (1165)  41481  81.1%

Most occurring characters

Value              Count   Frequency (%)
a                  54764   14.8%
n                  29578   8.0%
i                  25087   6.8%
e                  23322   6.3%
r                  21272   5.7%
o                  21243   5.7%
l                  17914   4.8%
t                  15922   4.3%
s                  14597   3.9%
(space)            12708   3.4%
Other values (74)  134510  36.3%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   299227  80.7%
Uppercase Letter   53874   14.5%
Space Separator    12708   3.4%
Dash Punctuation   4340    1.2%
Other Punctuation  610     0.2%
Open Punctuation   79      < 0.1%
Close Punctuation  79      < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
a                  54764  18.3%
n                  29578  9.9%
i                  25087  8.4%
e                  23322  7.8%
r                  21272  7.1%
o                  21243  7.1%
l                  17914  6.0%
t                  15922  5.3%
s                  14597  4.9%
u                  11030  3.7%
Other values (40)  64498  21.6%

Uppercase Letter
Value              Count  Frequency (%)
C                  5699   10.6%
S                  5187   9.6%
A                  4167   7.7%
N                  3758   7.0%
M                  3079   5.7%
P                  3007   5.6%
B                  2684   5.0%
T                  2357   4.4%
W                  2286   4.2%
G                  1943   3.6%
Other values (18)  19707  36.6%

Other Punctuation
Value  Count  Frequency (%)
'      571    93.6%
.      39     6.4%

Space Separator
Value    Count  Frequency (%)
(space)  12708  100.0%

Open Punctuation
Value  Count  Frequency (%)
(      79     100.0%

Close Punctuation
Value  Count  Frequency (%)
)      79     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      4340   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   353101  95.2%
Common  17816   4.8%

Most frequent character per script

Latin
Value              Count   Frequency (%)
a                  54764   15.5%
n                  29578   8.4%
i                  25087   7.1%
e                  23322   6.6%
r                  21272   6.0%
o                  21243   6.0%
l                  17914   5.1%
t                  15922   4.5%
s                  14597   4.1%
u                  11030   3.1%
Other values (68)  118372  33.5%

Common
Value    Count  Frequency (%)
(space)  12708  71.3%
-        4340   24.4%
'        571    3.2%
(        79     0.4%
)        79     0.4%
.        39     0.2%

Most occurring blocks

Value                 Count   Frequency (%)
ASCII                 367458  99.1%
None                  3377    0.9%
Latin Ext Additional  82      < 0.1%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
a                  54764   14.9%
n                  29578   8.0%
i                  25087   6.8%
e                  23322   6.3%
r                  21272   5.8%
o                  21243   5.8%
l                  17914   4.9%
t                  15922   4.3%
s                  14597   4.0%
(space)            12708   3.5%
Other values (48)  131051  35.7%

Latin Ext Additional
Value            Count  Frequency (%)
(not preserved)  30     36.6%
(not preserved)  30     36.6%
(not preserved)  11     13.4%
(not preserved)  11     13.4%

None
Value              Count  Frequency (%)
é                  681    20.2%
á                  669    19.8%
í                  549    16.3%
ô                  505    15.0%
ã                  337    10.0%
ó                  218    6.5%
ü                  204    6.0%
è                  50     1.5%
à                  39     1.2%
ä                  26     0.8%
Other values (12)  99     2.9%

Country
Categorical

HIGH CARDINALITY

Distinct: 147
Distinct (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

United States: 7556
Australia: 2137
France: 2114
Mexico: 2012
Germany: 1570
Other values (142): 23082

Length

Max length: 32
Median length: 8
Mean length: 8.849600998
Min length: 4

Characters and Unicode

Total characters: 340453
Distinct characters: 54
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: United States
2nd row: Netherlands
3rd row: Cambodia
4th row: Peru
5th row: United Kingdom

Value               Count  Frequency (%)
United States       7556   19.6%
Australia           2137   5.6%
France              2114   5.5%
Mexico              2012   5.2%
Germany             1570   4.1%
China               1400   3.6%
United Kingdom      1220   3.2%
Brazil              1208   3.1%
India               1154   3.0%
Indonesia           1057   2.7%
Other values (137)  17043  44.3%

Histogram of lengths of the category

Value               Count  Frequency (%)
united              8787   17.2%
states              7556   14.8%
australia           2137   4.2%
france              2114   4.1%
mexico              2012   3.9%
germany             1570   3.1%
china               1400   2.7%
kingdom             1220   2.4%
brazil              1208   2.4%
india               1154   2.3%
Other values (154)  22018  43.0%

Most occurring characters

Value              Count   Frequency (%)
a                  41325   12.1%
e                  32547   9.6%
t                  30534   9.0%
i                  30506   9.0%
n                  27638   8.1%
d                  16001   4.7%
r                  15216   4.5%
s                  13828   4.1%
(space)            12705   3.7%
o                  10881   3.2%
Other values (44)  109272  32.1%

Most occurring categories

Value              Count   Frequency (%)
Lowercase Letter   276908  81.3%
Uppercase Letter   50547   14.8%
Space Separator    12705   3.7%
Open Punctuation   100     < 0.1%
Close Punctuation  100     < 0.1%
Other Punctuation  85      < 0.1%
Dash Punctuation   8       < 0.1%

Most frequent character per category

Lowercase Letter
Value              Count  Frequency (%)
a                  41325  14.9%
e                  32547  11.8%
t                  30534  11.0%
i                  30506  11.0%
n                  27638  10.0%
d                  16001  5.8%
r                  15216  5.5%
s                  13828  5.0%
o                  10881  3.9%
l                  10138  3.7%
Other values (16)  48294  17.4%

Uppercase Letter
Value              Count  Frequency (%)
S                  10051  19.9%
U                  9143   18.1%
I                  3997   7.9%
A                  3616   7.2%
C                  3174   6.3%
M                  2806   5.6%
F                  2154   4.3%
G                  2125   4.2%
N                  2031   4.0%
B                  1762   3.5%
Other values (13)  9688   19.2%

Space Separator
Value    Count  Frequency (%)
(space)  12705  100.0%

Open Punctuation
Value  Count  Frequency (%)
(      100    100.0%

Close Punctuation
Value  Count  Frequency (%)
)      100    100.0%

Other Punctuation
Value  Count  Frequency (%)
'      85     100.0%

Dash Punctuation
Value  Count  Frequency (%)
-      8      100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   327455  96.2%
Common  12998   3.8%

Most frequent character per script

Latin
Value              Count  Frequency (%)
a                  41325  12.6%
e                  32547  9.9%
t                  30534  9.3%
i                  30506  9.3%
n                  27638  8.4%
d                  16001  4.9%
r                  15216  4.6%
s                  13828  4.2%
o                  10881  3.3%
l                  10138  3.1%
Other values (39)  98841  30.2%

Common
Value    Count  Frequency (%)
(space)  12705  97.7%
(        100    0.8%
)        100    0.8%
'        85     0.7%
-        8      0.1%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  340453  100.0%

Most frequent character per block

ASCII
Value              Count   Frequency (%)
a                  41325   12.1%
e                  32547   9.6%
t                  30534   9.0%
i                  30506   9.0%
n                  27638   8.1%
d                  16001   4.7%
r                  15216   4.5%
s                  13828   4.1%
(space)            12705   3.7%
o                  10881   3.2%
Other values (44)  109272  32.1%

Postal Code
Real number (ℝ≥0)

MISSING

Distinct: 609
Distinct (%): 8.1%
Missing: 30915
Missing (%): 80.4%
Infinite: 0
Infinite (%): 0.0%
Mean: 55150.06392
Minimum: 1040
Maximum: 99301
Zeros: 0
Zeros (%): 0.0%
Memory size: 300.7 KiB

Quantile statistics

Minimum: 1040
5-th percentile: 10009
Q1: 23223
median: 56301
Q3: 90008
95-th percentile: 97567
Maximum: 99301
Range: 98261
Interquartile range (IQR): 66785

Descriptive statistics

Standard deviation: 32021.07257
Coefficient of variation (CV): 0.5806171433
Kurtosis: -1.494324426
Mean: 55150.06392
Median Absolute Deviation (MAD): 33703
Skewness: -0.1265501212
Sum: 416713883
Variance: 1025349088
Monotonicity: Not monotonic
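
With 80.4% of Postal Code missing, the mean and the distinct percentage only make sense over the non-missing rows. The sketch below assumes that is how they were computed and checks the arithmetic. The resulting 7556 non-missing rows coincide with the United States count in the Country section, which suggests (but does not prove) that postal codes are recorded only for US orders.

```python
n_observations = 38471
missing = 30915
total = 416713883   # Sum of the non-missing postal codes
distinct = 609

# Assumption: statistics are computed over non-missing values only.
present = n_observations - missing
mean = total / present
distinct_pct = 100 * distinct / present

print(present, round(mean, 5), round(distinct_pct, 1))
```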
Histogram with fixed size bins (bins=50)

Common values
Value               Count  Frequency (%)
10035               202    0.5%
10024               186    0.5%
10009               172    0.4%
94122               154    0.4%
10011               147    0.4%
19134               127    0.3%
98105               126    0.3%
90049               117    0.3%
94110               117    0.3%
98103               116    0.3%
Other values (599)  6092   15.8%
(Missing)           30915  80.4%

Minimum 5 values
Value  Count  Frequency (%)
1040   1      < 0.1%
1453   4      < 0.1%
1752   2      < 0.1%
1810   4      < 0.1%
1841   27     0.1%

Maximum 5 values
Value  Count  Frequency (%)
99301  3      < 0.1%
99207  5      < 0.1%
98661  4      < 0.1%
98632  3      < 0.1%
98502  5      < 0.1%

Market
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

APAC: 8242
LATAM: 7734
US: 7556
EU: 7469
EMEA: 3735
Other values (2): 3735

Length

Max length: 6
Median length: 4
Mean length: 3.614098932
Min length: 2

Characters and Unicode

Total characters: 139038
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: US
2nd row: EU
3rd row: APAC
4th row: LATAM
5th row: EU

Value   Count  Frequency (%)
APAC    8242   21.4%
LATAM   7734   20.1%
US      7556   19.6%
EU      7469   19.4%
EMEA    3735   9.7%
Africa  3450   9.0%
Canada  285    0.7%

Histogram of lengths of the category

Value   Count  Frequency (%)
apac    8242   21.4%
latam   7734   20.1%
us      7556   19.6%
eu      7469   19.4%
emea    3735   9.7%
africa  3450   9.0%
canada  285    0.7%

Most occurring characters

Value             Count  Frequency (%)
A                 39137  28.1%
U                 15025  10.8%
E                 14939  10.7%
M                 11469  8.2%
C                 8527   6.1%
P                 8242   5.9%
L                 7734   5.6%
T                 7734   5.6%
S                 7556   5.4%
a                 4305   3.1%
Other values (6)  14370  10.3%

Most occurring categories

Value             Count   Frequency (%)
Uppercase Letter  120363  86.6%
Lowercase Letter  18675   13.4%

Most frequent character per category

Uppercase Letter
Value  Count  Frequency (%)
A      39137  32.5%
U      15025  12.5%
E      14939  12.4%
M      11469  9.5%
C      8527   7.1%
P      8242   6.8%
L      7734   6.4%
T      7734   6.4%
S      7556   6.3%

Lowercase Letter
Value  Count  Frequency (%)
a      4305   23.1%
f      3450   18.5%
r      3450   18.5%
i      3450   18.5%
c      3450   18.5%
n      285    1.5%
d      285    1.5%

Most occurring scripts

Value  Count   Frequency (%)
Latin  139038  100.0%

Most frequent character per script

Latin
Value             Count  Frequency (%)
A                 39137  28.1%
U                 15025  10.8%
E                 14939  10.7%
M                 11469  8.2%
C                 8527   6.1%
P                 8242   5.9%
L                 7734   5.6%
T                 7734   5.6%
S                 7556   5.4%
a                 4305   3.1%
Other values (6)  14370  10.3%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  139038  100.0%

Most frequent character per block

ASCII
Value             Count  Frequency (%)
A                 39137  28.1%
U                 15025  10.8%
E                 14939  10.7%
M                 11469  8.2%
C                 8527   6.1%
P                 8242   5.9%
L                 7734   5.6%
T                 7734   5.6%
S                 7556   5.4%
a                 4305   3.1%
Other values (6)  14370  10.3%

Region
Categorical

HIGH CORRELATION

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

Central: 8318
South: 5012
EMEA: 3735
North: 3606
Africa: 3450
Other values (8): 14350

Length

Max length: 14
Median length: 6
Mean length: 6.634165995
Min length: 4

Characters and Unicode

Total characters: 255223
Distinct characters: 24
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: South
2nd row: Central
3rd row: Southeast Asia
4th row: South
5th row: North

Value             Count  Frequency (%)
Central           8318   21.6%
South             5012   13.0%
EMEA              3735   9.7%
North             3606   9.4%
Africa            3450   9.0%
Oceania           2628   6.8%
West              2412   6.3%
Southeast Asia    2334   6.1%
East              2148   5.6%
North Asia        1741   4.5%
Other values (3)  3087   8.0%

Histogram of lengths of the category

Value             Count  Frequency (%)
central           9857   22.4%
asia              5614   12.7%
north             5347   12.1%
south             5012   11.4%
emea              3735   8.5%
africa            3450   7.8%
oceania           2628   6.0%
west              2412   5.5%
southeast         2334   5.3%
east              2148   4.9%
Other values (2)  1548   3.5%

Most occurring characters

Value              Count  Frequency (%)
a                  32040  12.6%
t                  29444  11.5%
r                  19917  7.8%
e                  18494  7.2%
n                  14033  5.5%
i                  12955  5.1%
A                  12799  5.0%
o                  12693  5.0%
h                  12693  5.0%
s                  12508  4.9%
Other values (14)  77647  30.4%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  194319  76.1%
Uppercase Letter  55290   21.7%
Space Separator   5614    2.2%

Most frequent character per category

Lowercase Letter
Value             Count  Frequency (%)
a                 32040  16.5%
t                 29444  15.2%
r                 19917  10.2%
e                 18494  9.5%
n                 14033  7.2%
i                 12955  6.7%
o                 12693  6.5%
h                 12693  6.5%
s                 12508  6.4%
l                 9857   5.1%
Other values (5)  19685  10.1%

Uppercase Letter
Value  Count  Frequency (%)
A      12799  23.1%
C      11405  20.6%
E      9618   17.4%
S      7346   13.3%
N      5347   9.7%
M      3735   6.8%
O      2628   4.8%
W      2412   4.4%

Space Separator
Value    Count  Frequency (%)
(space)  5614   100.0%

Most occurring scripts

Value   Count   Frequency (%)
Latin   249609  97.8%
Common  5614    2.2%

Most frequent character per script

Latin
Value              Count  Frequency (%)
a                  32040  12.8%
t                  29444  11.8%
r                  19917  8.0%
e                  18494  7.4%
n                  14033  5.6%
i                  12955  5.2%
A                  12799  5.1%
o                  12693  5.1%
h                  12693  5.1%
s                  12508  5.0%
Other values (13)  72033  28.9%

Common
Value    Count  Frequency (%)
(space)  5614   100.0%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  255223  100.0%

Most frequent character per block

ASCII
Value              Count  Frequency (%)
a                  32040  12.6%
t                  29444  11.5%
r                  19917  7.8%
e                  18494  7.2%
n                  14033  5.5%
i                  12955  5.1%
A                  12799  5.0%
o                  12693  5.0%
h                  12693  5.0%
s                  12508  4.9%
Other values (14)  77647  30.4%

Product ID
Categorical

HIGH CARDINALITY

Distinct: 9815
Distinct (%): 25.5%
Missing: 0
Missing (%): 0.0%
Memory size: 300.7 KiB

OFF-AR-10003651: 27
OFF-AR-10003829: 25
OFF-BI-10003708: 25
OFF-BI-10004632: 22
OFF-BI-10002799: 21
Other values (9810): 38351

Length

Max length: 16
Median length: 15
Mean length: 15.19422422
Min length: 15

Characters and Unicode

Total characters: 584537
Distinct characters: 35
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1729
Unique (%): 4.5%

Sample

1st row: TEC-AC-10004227
2nd row: OFF-LA-10003699
3rd row: FUR-BO-10000112
4th row: FUR-CH-10004338
5th row: OFF-ST-10001646

Value                Count  Frequency (%)
OFF-AR-10003651      27     0.1%
OFF-AR-10003829      25     0.1%
OFF-BI-10003708      25     0.1%
OFF-BI-10004632      22     0.1%
OFF-BI-10002799      21     0.1%
OFF-BI-10000542      21     0.1%
FUR-CH-10003354      20     0.1%
OFF-BI-10004140      20     0.1%
OFF-BI-10003650      19     < 0.1%
OFF-BI-10002570      19     < 0.1%
Other values (9805)  38252  99.4%

Histogram of lengths of the category

Value                Count  Frequency (%)
tec-hp               65     0.2%
off-ar-10003651      27     0.1%
off-bi-10003708      25     0.1%
off-ar-10003829      25     0.1%
off-bi-10004632      22     0.1%
off-bi-10002799      21     0.1%
off-bi-10000542      21     0.1%
fur-ch-10003354      20     0.1%
off-bi-10004140      20     0.1%
off-bi-10003650      19     < 0.1%
Other values (9806)  38271  99.3%

Most occurring characters

ValueCountFrequency (%)
0134600
23.0%
-76942
13.2%
F58466
10.0%
157892
9.9%
O27809
 
4.8%
219214
 
3.3%
319162
 
3.3%
418964
 
3.2%
A15211
 
2.6%
C14404
 
2.5%
Other values (25)141873
24.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number307768
52.7%
Uppercase Letter199762
34.2%
Dash Punctuation76942
 
13.2%
Space Separator65
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
F58466
29.3%
O27809
13.9%
A15211
 
7.6%
C14404
 
7.2%
T12026
 
6.0%
E11519
 
5.8%
U11185
 
5.6%
R11037
 
5.5%
S6338
 
3.2%
B6164
 
3.1%
Other values (13)25603
12.8%
ValueCountFrequency (%)
0134600
43.7%
157892
18.8%
219214
 
6.2%
319162
 
6.2%
418964
 
6.2%
512164
 
4.0%
711675
 
3.8%
911542
 
3.8%
811500
 
3.7%
611055
 
3.6%
ValueCountFrequency (%)
-76942
100.0%
ValueCountFrequency (%)
65
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common384775
65.8%
Latin199762
34.2%

Most frequent character per script

ValueCountFrequency (%)
F58466
29.3%
O27809
13.9%
A15211
 
7.6%
C14404
 
7.2%
T12026
 
6.0%
E11519
 
5.8%
U11185
 
5.6%
R11037
 
5.5%
S6338
 
3.2%
B6164
 
3.1%
Other values (13)25603
12.8%
ValueCountFrequency (%)
0134600
35.0%
-76942
20.0%
157892
15.0%
219214
 
5.0%
319162
 
5.0%
418964
 
4.9%
512164
 
3.2%
711675
 
3.0%
911542
 
3.0%
811500
 
3.0%
Other values (2)11120
 
2.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII584537
100.0%

Most frequent character per block

ValueCountFrequency (%)
0134600
23.0%
-76942
13.2%
F58466
10.0%
157892
9.9%
O27809
 
4.8%
219214
 
3.3%
319162
 
3.3%
418964
 
3.2%
A15211
 
2.6%
C14404
 
2.5%
Other values (25)141873
24.3%

Category
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size300.7 KiB
Office Supplies
23436 
Technology
7593 
Furniture
7442 

Length

Max length15
Median length15
Mean length12.85248629
Min length9
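
The length statistics above follow directly from the three category counts reported for this variable. A minimal check, assuming only those counts (the variable names are illustrative):

```python
from statistics import median

# Counts per category as reported for this variable.
category_counts = {"Office Supplies": 23436, "Technology": 7593, "Furniture": 7442}

# One string length per observation.
lengths = [len(name) for name, n in category_counts.items() for _ in range(n)]

total_chars = sum(lengths)
mean_length = total_chars / len(lengths)

print("max", max(lengths))
print("median", median(lengths))
print("mean", round(mean_length, 8))
print("min", min(lengths))
print("total characters", total_chars)
```

The total of 494448 characters matches the figure reported under "Characters and Unicode" for this variable.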

Characters and Unicode

Total characters494448
Distinct characters20
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTechnology
2nd rowOffice Supplies
3rd rowFurniture
4th rowFurniture
5th rowOffice Supplies
Value | Count | Frequency (%)
Office Supplies | 23436 | 60.9%
Technology | 7593 | 19.7%
Furniture | 7442 | 19.3%
Histogram of lengths of the category
ValueCountFrequency (%)
office23436
37.9%
supplies23436
37.9%
technology7593
 
12.3%
furniture7442
 
12.0%

Most occurring characters

ValueCountFrequency (%)
e61907
12.5%
i54314
11.0%
f46872
9.5%
p46872
9.5%
u38320
 
7.8%
c31029
 
6.3%
l31029
 
6.3%
O23436
 
4.7%
23436
 
4.7%
S23436
 
4.7%
Other values (10)113797
23.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter409105
82.7%
Uppercase Letter61907
 
12.5%
Space Separator23436
 
4.7%

Most frequent character per category

ValueCountFrequency (%)
e61907
15.1%
i54314
13.3%
f46872
11.5%
p46872
11.5%
u38320
9.4%
c31029
7.6%
l31029
7.6%
s23436
 
5.7%
o15186
 
3.7%
n15035
 
3.7%
Other values (5)45105
11.0%
ValueCountFrequency (%)
O23436
37.9%
S23436
37.9%
T7593
 
12.3%
F7442
 
12.0%
ValueCountFrequency (%)
23436
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin471012
95.3%
Common23436
 
4.7%

Most frequent character per script

ValueCountFrequency (%)
e61907
13.1%
i54314
11.5%
f46872
10.0%
p46872
10.0%
u38320
8.1%
c31029
 
6.6%
l31029
 
6.6%
O23436
 
5.0%
S23436
 
5.0%
s23436
 
5.0%
Other values (9)90361
19.2%
ValueCountFrequency (%)
23436
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII494448
100.0%

Most frequent character per block

ValueCountFrequency (%)
e61907
12.5%
i54314
11.0%
f46872
9.5%
p46872
9.5%
u38320
 
7.8%
c31029
 
6.3%
l31029
 
6.3%
O23436
 
4.7%
23436
 
4.7%
S23436
 
4.7%
Other values (10)113797
23.0%

Sub-Category
Categorical

HIGH CORRELATION

Distinct17
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size300.7 KiB
Binders
4614 
Storage
3802 
Art
3639 
Paper
2663 
Chairs
2576 
Other values (12)
21177 

Length

Max length11
Median length7
Mean length7.23516415
Min length3

Characters and Unicode

Total characters278344
Distinct characters28
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowAccessories
2nd rowLabels
3rd rowBookcases
4th rowChairs
5th rowStorage
ValueCountFrequency (%)
Binders4614
12.0%
Storage3802
 
9.9%
Art3639
 
9.5%
Paper2663
 
6.9%
Chairs2576
 
6.7%
Phones2547
 
6.6%
Furnishings2372
 
6.2%
Accessories2344
 
6.1%
Labels1950
 
5.1%
Bookcases1835
 
4.8%
Other values (7)10129
26.3%
Histogram of lengths of the category
ValueCountFrequency (%)
binders4614
12.0%
storage3802
 
9.9%
art3639
 
9.5%
paper2663
 
6.9%
chairs2576
 
6.7%
phones2547
 
6.6%
furnishings2372
 
6.2%
accessories2344
 
6.1%
labels1950
 
5.1%
bookcases1835
 
4.8%
Other values (7)10129
26.3%

Most occurring characters

ValueCountFrequency (%)
s39070
14.0%
e35865
12.9%
r25432
 
9.1%
i20111
 
7.2%
n17947
 
6.4%
a17698
 
6.4%
o15806
 
5.7%
p12368
 
4.4%
t9249
 
3.3%
c8928
 
3.2%
Other values (18)75870
27.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter239873
86.2%
Uppercase Letter38471
 
13.8%

Most frequent character per category

ValueCountFrequency (%)
s39070
16.3%
e35865
15.0%
r25432
10.6%
i20111
8.4%
n17947
7.5%
a17698
7.4%
o15806
6.6%
p12368
 
5.2%
t9249
 
3.9%
c8928
 
3.7%
Other values (8)37399
15.6%
ValueCountFrequency (%)
A7300
19.0%
B6449
16.8%
S5616
14.6%
P5210
13.5%
C4190
10.9%
F4180
10.9%
L1950
 
5.1%
E1829
 
4.8%
M1088
 
2.8%
T659
 
1.7%

Most occurring scripts

ValueCountFrequency (%)
Latin278344
100.0%

Most frequent character per script

ValueCountFrequency (%)
s39070
14.0%
e35865
12.9%
r25432
 
9.1%
i20111
 
7.2%
n17947
 
6.4%
a17698
 
6.4%
o15806
 
5.7%
p12368
 
4.4%
t9249
 
3.3%
c8928
 
3.2%
Other values (18)75870
27.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII278344
100.0%

Most frequent character per block

ValueCountFrequency (%)
s39070
14.0%
e35865
12.9%
r25432
 
9.1%
i20111
 
7.2%
n17947
 
6.4%
a17698
 
6.4%
o15806
 
5.7%
p12368
 
4.4%
t9249
 
3.3%
c8928
 
3.2%
Other values (18)75870
27.3%

Product Name
Categorical

HIGH CARDINALITY

Distinct3750
Distinct (%)9.7%
Missing0
Missing (%)0.0%
Memory size300.7 KiB
Staples
 
165
Cardinal Index Tab, Clear
 
72
Ibico Index Tab, Clear
 
67
Eldon File Cart, Single Width
 
66
Smead File Cart, Single Width
 
63
Other values (3745)
38038 

Length

Max length127
Median length29
Mean length30.90800863
Min length5

Characters and Unicode

Total characters1189062
Distinct characters85
Distinct categories12 ?
Distinct scripts2 ?
Distinct blocks3 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique181 ?
Unique (%)0.5%

Sample

1st rowSanDisk Ultra 16 GB MicroSDHC Class 10 Memory Card
2nd rowSmead File Folder Labels, Adjustable
3rd rowDania Corner Shelving, Pine
4th rowHon Bag Chairs, Red
5th rowFellowes Box, Wire Frame
ValueCountFrequency (%)
Staples165
 
0.4%
Cardinal Index Tab, Clear72
 
0.2%
Ibico Index Tab, Clear67
 
0.2%
Eldon File Cart, Single Width66
 
0.2%
Smead File Cart, Single Width63
 
0.2%
Sanford Pencil Sharpener, Water Color60
 
0.2%
Acco Index Tab, Clear59
 
0.2%
Rogers File Cart, Single Width57
 
0.1%
Tenex File Cart, Single Width53
 
0.1%
Stanley Pencil Sharpener, Water Color50
 
0.1%
Other values (3740)37759
98.1%
Histogram of lengths of the category
ValueCountFrequency (%)
labels1783
 
1.0%
recycled1729
 
1.0%
with1672
 
1.0%
set1599
 
0.9%
color1580
 
0.9%
blue1568
 
0.9%
durable1561
 
0.9%
black1528
 
0.9%
avery1464
 
0.8%
clear1414
 
0.8%
Other values (2797)158116
90.9%

Most occurring characters

ValueCountFrequency (%)
135239
 
11.4%
e116177
 
9.8%
a70919
 
6.0%
r68645
 
5.8%
o66508
 
5.6%
l60120
 
5.1%
i59436
 
5.0%
n50986
 
4.3%
t47003
 
4.0%
s45347
 
3.8%
Other values (75)468682
39.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter814433
68.5%
Uppercase Letter176764
 
14.9%
Space Separator135585
 
11.4%
Other Punctuation37615
 
3.2%
Decimal Number19537
 
1.6%
Dash Punctuation4945
 
0.4%
Open Punctuation47
 
< 0.1%
Close Punctuation47
 
< 0.1%
Final Punctuation46
 
< 0.1%
Math Symbol26
 
< 0.1%
Other values (2)17
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
e116177
14.3%
a70919
 
8.7%
r68645
 
8.4%
o66508
 
8.2%
l60120
 
7.4%
i59436
 
7.3%
n50986
 
6.3%
t47003
 
5.8%
s45347
 
5.6%
c32468
 
4.0%
Other values (18)196824
24.2%
ValueCountFrequency (%)
S24995
14.1%
C20660
11.7%
B17076
 
9.7%
P13620
 
7.7%
E9728
 
5.5%
A9333
 
5.3%
F9227
 
5.2%
M8017
 
4.5%
R7798
 
4.4%
T7576
 
4.3%
Other values (16)48734
27.6%
ValueCountFrequency (%)
,33277
88.5%
/1197
 
3.2%
&1070
 
2.8%
"986
 
2.6%
.778
 
2.1%
'181
 
0.5%
#70
 
0.2%
%35
 
0.1%
*7
 
< 0.1%
!7
 
< 0.1%
Other values (2)7
 
< 0.1%
ValueCountFrequency (%)
14091
20.9%
03939
20.2%
52343
12.0%
22082
10.7%
32012
10.3%
81389
 
7.1%
41352
 
6.9%
9929
 
4.8%
6726
 
3.7%
7674
 
3.4%
ValueCountFrequency (%)
135239
99.7%
 346
 
0.3%
ValueCountFrequency (%)
-4945
100.0%
ValueCountFrequency (%)
(47
100.0%
ValueCountFrequency (%)
)47
100.0%
ValueCountFrequency (%)
¾3
100.0%
ValueCountFrequency (%)
46
100.0%
ValueCountFrequency (%)
+26
100.0%
ValueCountFrequency (%)
14
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin991197
83.4%
Common197865
 
16.6%

Most frequent character per script

ValueCountFrequency (%)
e116177
 
11.7%
a70919
 
7.2%
r68645
 
6.9%
o66508
 
6.7%
l60120
 
6.1%
i59436
 
6.0%
n50986
 
5.1%
t47003
 
4.7%
s45347
 
4.6%
c32468
 
3.3%
Other values (44)373588
37.7%
ValueCountFrequency (%)
135239
68.3%
,33277
 
16.8%
-4945
 
2.5%
14091
 
2.1%
03939
 
2.0%
52343
 
1.2%
22082
 
1.1%
32012
 
1.0%
81389
 
0.7%
41352
 
0.7%
Other values (21)7196
 
3.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII1188637
> 99.9%
None365
 
< 0.1%
Punctuation60
 
< 0.1%

Most frequent character per block

ValueCountFrequency (%)
135239
 
11.4%
e116177
 
9.8%
a70919
 
6.0%
r68645
 
5.8%
o66508
 
5.6%
l60120
 
5.1%
i59436
 
5.0%
n50986
 
4.3%
t47003
 
4.0%
s45347
 
3.8%
Other values (69)468257
39.4%
ValueCountFrequency (%)
 346
94.8%
é14
 
3.8%
¾3
 
0.8%
à2
 
0.5%
ValueCountFrequency (%)
46
76.7%
14
 
23.3%

Sales
Real number (ℝ≥0)

Distinct22491
Distinct (%)58.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean246.182845
Minimum0.444
Maximum22638.48
Zeros0
Zeros (%)0.0%
Memory size300.7 KiB

Quantile statistics

Minimum0.444
5-th percentile8.72
Q130.69
median85.44
Q3250.8972
95-th percentile1008.10134
Maximum22638.48
Range22638.036
Interquartile range (IQR)220.2072

Descriptive statistics

Standard deviation493.7178054
Coefficient of variation (CV)2.005492322
Kurtosis211.2107054
Mean246.182845
Median Absolute Deviation (MAD)67.26
Skewness8.949622475
Sum9470900.23
Variance243757.2714
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
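
Several of the Sales statistics above are derived from others, which gives a quick consistency check. The sketch below recomputes them from the reported mean, standard deviation, quartiles and extremes; it uses only numbers printed in this report, and the variable names are illustrative.

```python
# Values copied from the Sales statistics in this report.
mean = 246.182845
std = 493.7178054
q1, q3 = 30.69, 250.8972
minimum, maximum = 0.444, 22638.48

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance is the squared standard deviation

print(f"CV       {cv:.9f}")
print(f"IQR      {iqr:.4f}")
print(f"Range    {value_range:.3f}")
print(f"Variance {variance:.4f}")
```

A coefficient of variation of about 2 together with a skewness near 9 indicates a heavily right-skewed distribution, so the median (85.44) is a more representative "typical" sale than the mean.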
Common values

Value | Count | Frequency (%)
12.96 | 38 | 0.1%
19.44 | 31 | 0.1%
24 | 29 | 0.1%
10.368 | 28 | 0.1%
32.4 | 26 | 0.1%
25.92 | 26 | 0.1%
17.52 | 25 | 0.1%
15.552 | 24 | 0.1%
27.96 | 23 | 0.1%
12.36 | 20 | 0.1%
Other values (22481) | 38201 | 99.3%

Minimum 5 values

Value | Count | Frequency (%)
0.444 | 1 | < 0.1%
0.556 | 1 | < 0.1%
0.836 | 1 | < 0.1%
0.984 | 1 | < 0.1%
0.99 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
22638.48 | 1 | < 0.1%
17499.95 | 1 | < 0.1%
13999.96 | 1 | < 0.1%
11199.968 | 1 | < 0.1%
10499.97 | 1 | < 0.1%

Quantity
Real number (ℝ≥0)

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.475630995
Minimum1
Maximum14
Zeros0
Zeros (%)0.0%
Memory size300.7 KiB

Quantile statistics

Minimum1
5-th percentile1
Q12
median3
Q35
95-th percentile8
Maximum14
Range13
Interquartile range (IQR)3

Descriptive statistics

Standard deviation2.280458594
Coefficient of variation (CV)0.6561279368
Kurtosis2.331191556
Mean3.475630995
Median Absolute Deviation (MAD)1
Skewness1.371360107
Sum133711
Variance5.200491397
MonotonicityNot monotonic
Histogram with fixed size bins (bins=14)
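Common values

Value | Count | Frequency (%)
2 | 9598 | 24.9%
3 | 7260 | 18.9%
1 | 6706 | 17.4%
4 | 4797 | 12.5%
5 | 3652 | 9.5%
6 | 2243 | 5.8%
7 | 1806 | 4.7%
8 | 1013 | 2.6%
9 | 738 | 1.9%
10 | 198 | 0.5%
Other values (4) | 460 | 1.2%

Minimum 5 values

Value | Count | Frequency (%)
1 | 6706 | 17.4%
2 | 9598 | 24.9%
3 | 7260 | 18.9%
4 | 4797 | 12.5%
5 | 3652 | 9.5%

Maximum 5 values

Value | Count | Frequency (%)
14 | 147 | 0.4%
13 | 62 | 0.2%
12 | 130 | 0.3%
11 | 121 | 0.3%
10 | 198 | 0.5%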

Discount
Real number (ℝ≥0)

ZEROS

Distinct29
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.1431364924
Minimum0
Maximum0.85
Zeros21767
Zeros (%)56.6%
Memory size300.7 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30.2
95-th percentile0.6
Maximum0.85
Range0.85
Interquartile range (IQR)0.2

Descriptive statistics

Standard deviation0.2124342108
Coefficient of variation (CV)1.484137323
Kurtosis0.7146596297
Mean0.1431364924
Median Absolute Deviation (MAD)0
Skewness1.386732947
Sum5506.604
Variance0.0451282939
MonotonicityNot monotonic
Histogram with fixed size bins (bins=29)
Common values

Value | Count | Frequency (%)
0 | 21767 | 56.6%
0.2 | 3781 | 9.8%
0.1 | 3057 | 7.9%
0.4 | 2382 | 6.2%
0.6 | 1509 | 3.9%
0.7 | 1343 | 3.5%
0.5 | 1217 | 3.2%
0.17 | 558 | 1.5%
0.47 | 544 | 1.4%
0.15 | 333 | 0.9%
Other values (19) | 1980 | 5.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 21767 | 56.6%
0.002 | 316 | 0.8%
0.07 | 104 | 0.3%
0.1 | 3057 | 7.9%
0.15 | 333 | 0.9%

Maximum 5 values

Value | Count | Frequency (%)
0.85 | 1 | < 0.1%
0.8 | 242 | 0.6%
0.7 | 1343 | 3.5%
0.65 | 14 | < 0.1%
0.602 | 14 | < 0.1%

Profit
Real number (ℝ)

ZEROS

Distinct22674
Distinct (%)58.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean28.81932709
Minimum-4088.376
Maximum8399.976
Zeros510
Zeros (%)1.3%
Memory size300.7 KiB

Quantile statistics

Minimum-4088.376
5-th percentile-84.765
Q10
median9.27
Q336.97965
95-th percentile211.887
Maximum8399.976
Range12488.352
Interquartile range (IQR)36.97965

Descriptive statistics

Standard deviation177.1409931
Coefficient of variation (CV)6.146604069
Kurtosis301.2305522
Mean28.81932709
Median Absolute Deviation (MAD)16.05
Skewness6.426160608
Sum1108708.332
Variance31378.93145
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 510 | 1.3%
7.92 | 47 | 0.1%
3.96 | 44 | 0.1%
4.32 | 44 | 0.1%
5.28 | 42 | 0.1%
2.97 | 41 | 0.1%
9 | 41 | 0.1%
2.64 | 41 | 0.1%
2.88 | 39 | 0.1%
1.26 | 37 | 0.1%
Other values (22664) | 37585 | 97.7%

Minimum 5 values

Value | Count | Frequency (%)
-4088.376 | 1 | < 0.1%
-3839.9904 | 1 | < 0.1%
-3701.8928 | 1 | < 0.1%
-3399.98 | 1 | < 0.1%
-3009.435 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
8399.976 | 1 | < 0.1%
6719.9808 | 1 | < 0.1%
5039.9856 | 1 | < 0.1%
4946.37 | 1 | < 0.1%
4630.4755 | 1 | < 0.1%

Shipping Cost
Real number (ℝ≥0)

Distinct14135
Distinct (%)36.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean26.30097417
Minimum0.002
Maximum933.57
Zeros0
Zeros (%)0.0%
Memory size300.7 KiB

Quantile statistics

Minimum0.002
5-th percentile0.6
Q12.6
median7.82
Q324.43
95-th percentile111.19
Maximum933.57
Range933.568
Interquartile range (IQR)21.83

Descriptive statistics

Standard deviation57.31690842
Coefficient of variation (CV)2.179269408
Kurtosis50.44812565
Mean26.30097417
Median Absolute Deviation (MAD)6.44
Skewness5.906416938
Sum1011824.777
Variance3285.22799
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0.35 | 54 | 0.1%
0.64 | 52 | 0.1%
0.86 | 49 | 0.1%
0.71 | 48 | 0.1%
0.79 | 47 | 0.1%
1.75 | 45 | 0.1%
1.26 | 45 | 0.1%
0.85 | 44 | 0.1%
0.67 | 44 | 0.1%
0.98 | 43 | 0.1%
Other values (14125) | 38000 | 98.8%

Minimum 5 values

Value | Count | Frequency (%)
0.002 | 1 | < 0.1%
0.003 | 1 | < 0.1%
0.01 | 5 | < 0.1%
0.019 | 1 | < 0.1%
0.02 | 3 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
933.57 | 1 | < 0.1%
915.49 | 1 | < 0.1%
910.16 | 1 | < 0.1%
897.35 | 1 | < 0.1%
867.69 | 1 | < 0.1%

Order Priority
Categorical

HIGH CORRELATION

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size300.7 KiB
Medium
22087 
High
11617 
Critical
2927 
Low
 
1840

Length

Max length8
Median length6
Mean length5.404746432
Min length3

Characters and Unicode

Total characters207926
Distinct characters18
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMedium
2nd rowLow
3rd rowMedium
4th rowMedium
5th rowMedium
ValueCountFrequency (%)
Medium22087
57.4%
High11617
30.2%
Critical2927
 
7.6%
Low1840
 
4.8%
Histogram of lengths of the category
ValueCountFrequency (%)
medium22087
57.4%
high11617
30.2%
critical2927
 
7.6%
low1840
 
4.8%

Most occurring characters

ValueCountFrequency (%)
i39558
19.0%
M22087
10.6%
e22087
10.6%
d22087
10.6%
u22087
10.6%
m22087
10.6%
H11617
 
5.6%
g11617
 
5.6%
h11617
 
5.6%
C2927
 
1.4%
Other values (8)20155
9.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter169455
81.5%
Uppercase Letter38471
 
18.5%

Most frequent character per category

ValueCountFrequency (%)
i39558
23.3%
e22087
13.0%
d22087
13.0%
u22087
13.0%
m22087
13.0%
g11617
 
6.9%
h11617
 
6.9%
r2927
 
1.7%
t2927
 
1.7%
c2927
 
1.7%
Other values (4)9534
 
5.6%
ValueCountFrequency (%)
M22087
57.4%
H11617
30.2%
C2927
 
7.6%
L1840
 
4.8%

Most occurring scripts

ValueCountFrequency (%)
Latin207926
100.0%

Most frequent character per script

ValueCountFrequency (%)
i39558
19.0%
M22087
10.6%
e22087
10.6%
d22087
10.6%
u22087
10.6%
m22087
10.6%
H11617
 
5.6%
g11617
 
5.6%
h11617
 
5.6%
C2927
 
1.4%
Other values (8)20155
9.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII207926
100.0%

Most frequent character per block

ValueCountFrequency (%)
i39558
19.0%
M22087
10.6%
e22087
10.6%
d22087
10.6%
u22087
10.6%
m22087
10.6%
H11617
 
5.6%
g11617
 
5.6%
h11617
 
5.6%
C2927
 
1.4%
Other values (8)20155
9.7%

Returned
Boolean

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing36159
Missing (%)94.0%
Memory size75.3 KiB
True
 
2312
(Missing)
36159 
Value | Count | Frequency (%)
True | 2312 | 6.0%
(Missing) | 36159 | 94.0%
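
The missing-value percentages in this report are plain ratios of missing cells to observations. A small sketch using the counts reported for Returned and, in the Warnings section, for Postal Code (the variable names are illustrative):

```python
n_rows = 38471
n_columns = 25

# Missing counts as reported for the two columns flagged "Missing".
missing = {"Returned": 36159, "Postal Code": 30915}

for column, n_missing in missing.items():
    print(f"{column}: {n_missing / n_rows:.1%} missing")

# Overall share of missing cells across the whole table.
total_missing = sum(missing.values())
overall = total_missing / (n_rows * n_columns)
print(f"All cells: {total_missing} missing ({overall:.1%})")
```

These two columns account for all 67074 missing cells listed in the dataset statistics. Whether a missing Returned value means "not returned" is not stated in the report, so treating it that way would be an assumption to verify against the source data.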

ship_delay
Real number (ℝ≥0)

ZEROS

Distinct8
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.974084375
Minimum0
Maximum7
Zeros1987
Zeros (%)5.2%
Memory size300.7 KiB

Quantile statistics

Minimum0
5-th percentile0
Q13
median4
Q35
95-th percentile7
Maximum7
Range7
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.737409026
Coefficient of variation (CV)0.4371847355
Kurtosis-0.2569962854
Mean3.974084375
Median Absolute Deviation (MAD)1
Skewness-0.4352293524
Sum152887
Variance3.018590125
MonotonicityNot monotonic
Histogram with fixed size bins (bins=8)
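Common values

Value | Count | Frequency (%)
4 | 10782 | 28.0%
5 | 8380 | 21.8%
2 | 5241 | 13.6%
6 | 4754 | 12.4%
3 | 3743 | 9.7%
7 | 2340 | 6.1%
0 | 1987 | 5.2%
1 | 1244 | 3.2%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1987 | 5.2%
1 | 1244 | 3.2%
2 | 5241 | 13.6%
3 | 3743 | 9.7%
4 | 10782 | 28.0%

Maximum 5 values

Value | Count | Frequency (%)
7 | 2340 | 6.1%
6 | 4754 | 12.4%
5 | 8380 | 21.8%
4 | 10782 | 28.0%
3 | 3743 | 9.7%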

Interactions

[Figures: 56 pairwise interaction plots; images not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
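
A minimal sketch of that calculation in plain Python, using small toy lists rather than columns from this dataset (the function name `pearson_r` is illustrative):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

r_same = pearson_r(xs, ys)                            # perfect positive line
r_shifted = pearson_r(xs, [10 + 3 * y for y in ys])   # location/scale change
r_negative = pearson_r(xs, [-y for y in ys])          # perfect negative line
print(r_same, r_shifted, r_negative)
```

The shifted and rescaled series gives the same coefficient as the original, which illustrates the invariance under changes in location and scale mentioned above.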

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
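
The rank-based calculation can be sketched as follows, assuming tied values receive the average of the ranks they span (a common convention, not something this report specifies). The helper names and toy data are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]   # monotonic but not linear
rho = spearman_rho(xs, ys)
tied = average_ranks([10, 20, 20, 30])
print(rho, tied)
```

On this toy data the relationship is monotonic but curved, so ρ reaches 1 while Pearson's r on the raw values would fall slightly below 1.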

Kendall's τ

Similar to Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
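
The pair-counting definition can be written directly as the sketch below. It implements the simple form described here (often called tau-a); statistical libraries commonly report a tie-adjusted variant instead, so values may differ when the data contain ties. The function name and toy data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 is a tie in at least one variable and counts as neither
    return (concordant - discordant) / len(pairs)

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 4]   # one swapped pair out of six
tau = kendall_tau_a(xs, ys)
print(tau)
```

With four observations there are six pairs; five are concordant and one is discordant, giving τ = 4/6.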

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
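
A sketch of a bias-corrected Cramér's V for a contingency table given as a list of rows, following the correction usually attributed to Bergsma (2013). This is written from that published formula rather than taken from the profiler's source, the function name is illustrative, and it assumes every row and column total is non-zero.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = cramers_v_corrected([[10, 20], [20, 40]])
perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(independent, perfect)
```

The first table has proportional rows, so the two variables are independent; the second puts every observation on the diagonal, so one variable determines the other.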

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

| | df_index | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | City | State | Country | Postal Code | Market | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | Shipping Cost | Order Priority | Returned | ship_delay |
| 0 | 27507 | 2012-12-27 | 2012-12-31 | Standard Class | SP-20860 | a7d03c30d416fc5f7d695b495884fdd7 | Corporate | Murfreesboro | Tennessee | United States | 37130.0 | US | South | TEC-AC-10004227 | Technology | Accessories | SanDisk Ultra 16 GB MicroSDHC Class 10 Memory Card | 72.7440 | 7 | 0.20 | -12.7302 | 6.720 | Medium | NaN | 4 |
| 1 | 35511 | 2014-12-25 | 2015-01-01 | Standard Class | JD-15895 | 1b2850c124acd1bc24237b4b5228b65e | Corporate | Oosterhout | North Brabant | Netherlands | NaN | EU | Central | OFF-LA-10003699 | Office Supplies | Labels | Smead File Folder Labels, Adjustable | 23.7300 | 7 | 0.50 | -21.0000 | 3.430 | Low | NaN | 7 |
| 2 | 9172 | 2012-05-08 | 2012-05-11 | Second Class | AB-10600 | 6acab08bb2b385c8569adfd24730ee01 | Corporate | Phnom Penh | Phnom Penh | Cambodia | NaN | APAC | Southeast Asia | FUR-BO-10000112 | Furniture | Bookcases | Dania Corner Shelving, Pine | 617.1000 | 5 | 0.00 | 172.6500 | 36.380 | Medium | NaN | 3 |
| 3 | 31366 | 2011-06-30 | 2011-07-02 | Second Class | GH-14410 | 1528a0a296f3ecf500753855ea9a21a5 | Home Office | Lima | Lima (city) | Peru | NaN | LATAM | South | FUR-CH-10004338 | Furniture | Chairs | Hon Bag Chairs, Red | 54.1800 | 3 | 0.40 | -32.5200 | 4.919 | Medium | Yes | 2 |
| 4 | 24465 | 2013-06-23 | 2013-06-26 | First Class | KW-16435 | 648a7c6f93ee0f453ee1378466a84ff8 | Consumer | London | England | United Kingdom | NaN | EU | North | OFF-ST-10001646 | Office Supplies | Storage | Fellowes Box, Wire Frame | 50.6250 | 3 | 0.10 | 20.2050 | 8.570 | Medium | NaN | 3 |
| 5 | 30265 | 2013-05-23 | 2013-05-26 | First Class | FC-14245 | c7ee4888116b2f40fd3fa048c57f93c9 | Home Office | Santiago de los Caballeros | Santiago | Dominican Republic | NaN | LATAM | Caribbean | OFF-AP-10001885 | Office Supplies | Appliances | Hamilton Beach Coffee Grinder, White | 43.5200 | 2 | 0.20 | -7.0800 | 5.377 | High | NaN | 3 |
| 6 | 38009 | 2014-10-31 | 2014-11-02 | Second Class | AJ-10780 | 5d7c7e88c8e01ea1ec06adaf52008919 | Corporate | Managua | Managua | Nicaragua | NaN | LATAM | Central | OFF-PA-10000108 | Office Supplies | Paper | Green Bar Parchment Paper, 8.5 x 11 | 27.7600 | 2 | 0.00 | 13.8800 | 2.737 | Medium | NaN | 2 |
| 7 | 40266 | 2013-11-11 | 2013-11-16 | Standard Class | BN-11515 | 583495e45655d0533f2d2b772d823971 | Consumer | Hanoi | Thủ Dô Hà Nội | Vietnam | NaN | APAC | Southeast Asia | OFF-LA-10002992 | Office Supplies | Labels | Novimex Removable Labels, 5000 Label Set | 32.7684 | 4 | 0.17 | -6.3516 | 2.180 | Medium | Yes | 5 |
| 8 | 24871 | 2013-02-27 | 2013-03-04 | Standard Class | CC-12550 | 0515ed679a66bff59a161a28317b6bd4 | Consumer | Broken Hill | New South Wales | Australia | NaN | APAC | Oceania | OFF-ST-10004015 | Office Supplies | Storage | Smead Trays, Blue | 130.8960 | 3 | 0.10 | 56.6460 | 8.300 | Medium | NaN | 5 |
| 9 | 12001 | 2012-06-25 | 2012-06-30 | Second Class | JE-15475 | 45d82b7ca3728955400b9b342ec412dc | Consumer | La Seyne-sur-Mer | Provence-Alpes-Côte d'Azur | France | NaN | EU | Central | TEC-AC-10004883 | Technology | Accessories | Enermax Keyboard, Programmable | 254.8800 | 3 | 0.00 | 109.5300 | 26.590 | Medium | NaN | 5 |
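
In the sample rows, ship_delay matches the number of days between Order Date and Ship Date. The report does not say how the column was built, so the sketch below is an assumption that is merely consistent with these rows; the helper name `ship_delay_days` is hypothetical.

```python
from datetime import date

def ship_delay_days(order_date, ship_date):
    """Whole days between two ISO-formatted dates (ship minus order)."""
    return (date.fromisoformat(ship_date) - date.fromisoformat(order_date)).days

# (Order Date, Ship Date, reported ship_delay) from the first sample rows.
rows = [
    ("2012-12-27", "2012-12-31", 4),
    ("2014-12-25", "2015-01-01", 7),
    ("2012-05-08", "2012-05-11", 3),
    ("2011-06-30", "2011-07-02", 2),
]

delays = [ship_delay_days(order, ship) for order, ship, _ in rows]
for (order, ship, reported), computed in zip(rows, delays):
    print(order, ship, "reported", reported, "computed", computed)
```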

Last rows

| | df_index | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | City | State | Country | Postal Code | Market | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | Shipping Cost | Order Priority | Returned | ship_delay |
| 38461 | 47191 | 2014-12-04 | 2014-12-08 | Standard Class | KH-16630 | 2e326a96a0a174fbabf7b2153c86a3c6 | Corporate | Panama City | Panama | Panama | NaN | LATAM | Central | OFF-FA-10001700 | Office Supplies | Fasteners | Accos Rubber Bands, Bulk Pack | 6.67200 | 1 | 0.400 | -0.12800 | 0.862 | High | NaN | 4 |
| 38462 | 21962 | 2014-07-11 | 2014-07-18 | Standard Class | GM-14695 | d4d0b11cd9b34e92c4a1dda4340de9f9 | Corporate | Melbourne | Victoria | Australia | NaN | APAC | Oceania | FUR-FU-10004503 | Furniture | Furnishings | Tenex Photo Frame, Black | 139.96800 | 3 | 0.100 | 43.48800 | 10.590 | Medium | NaN | 7 |
| 38463 | 37194 | 2013-05-25 | 2013-05-31 | Standard Class | GM-4440 | 3d1c57189f23bee3e938826f6556219c | Consumer | Beni Suef | Bani Suwayf | Egypt | NaN | Africa | Africa | OFF-CAR-10002031 | Office Supplies | Binders | Cardinal 3-Hole Punch, Durable | 121.20000 | 4 | 0.000 | 55.68000 | 2.960 | Medium | NaN | 6 |
| 38464 | 16850 | 2012-06-08 | 2012-06-13 | Second Class | JG-15160 | c6c64d1801f997e4e3628ed416cd160e | Consumer | Widnes | England | United Kingdom | NaN | EU | North | OFF-AR-10003384 | Office Supplies | Art | Boston Pens, Water Color | 104.40000 | 6 | 0.000 | 9.36000 | 16.560 | High | NaN | 5 |
| 38465 | 6265 | 2011-05-10 | 2011-05-10 | Same Day | BW-11110 | 2d806890acc865414ad191e4f11ec62a | Corporate | Barcelona | Catalonia | Spain | NaN | EU | South | TEC-MA-10003078 | Technology | Machines | Epson Printer, White | 469.85400 | 2 | 0.100 | -31.32600 | 54.130 | Critical | NaN | 0 |
| 38466 | 11284 | 2014-07-10 | 2014-07-14 | Second Class | PS-18970 | c90d076ff45727789cb1742f443028e1 | Home Office | Petapa | Guatemala | Guatemala | NaN | LATAM | Central | FUR-BO-10001483 | Furniture | Bookcases | Bush Corner Shelving, Metal | 246.90000 | 3 | 0.000 | 32.04000 | 28.644 | Medium | NaN | 4 |
| 38467 | 44732 | 2014-11-26 | 2014-12-02 | Standard Class | CK-12205 | 8fe3138a7ef91d7f8635f63b9d5331ad | Consumer | Panama City | Panama | Panama | NaN | LATAM | Central | OFF-LA-10002015 | Office Supplies | Labels | Hon Round Labels, Alphabetical | 15.55200 | 6 | 0.400 | 1.99200 | 1.281 | Medium | NaN | 6 |
| 38468 | 38158 | 2011-10-14 | 2011-10-18 | Second Class | LR-17035 | a916b8bb7b9fcce602d0808e2eef7979 | Corporate | Agra | Uttar Pradesh | India | NaN | APAC | Central Asia | OFF-LA-10004894 | Office Supplies | Labels | Hon Shipping Labels, Alphabetical | 44.76000 | 4 | 0.000 | 20.04000 | 2.690 | High | NaN | 4 |
| 38469 | 860 | 2012-11-06 | 2012-11-08 | First Class | NW-18400 | 2b29848d9cbad1e31f5cc583c49922cb | Consumer | San Luis Potosí | San Luis Potosí | Mexico | NaN | LATAM | North | TEC-CO-10002009 | Technology | Copiers | Brother Wireless Fax, High-Speed | 1003.34928 | 4 | 0.002 | 178.94928 | 219.533 | Critical | Yes | 2 |
| 38470 | 15795 | 2014-10-28 | 2014-10-28 | Same Day | PV-18985 | c734b7f250b798431a1d83f7b585c499 | Home Office | Franca | São Paulo | Brazil | NaN | LATAM | South | OFF-PA-10002725 | Office Supplies | Paper | Eaton Cards & Envelopes, Premium | 60.24000 | 2 | 0.000 | 18.04000 | 18.306 | Critical | NaN | 0 |